NSF PAR Search | NSF Public Access Repository

Analyses of 600+ insect genomes reveal repetitive element dynamics and highlight biodiversity-scale repeat annotation challenges

https://doi.org/10.1101/gr.277387.122

Sproul, John S.; Hotaling, Scott; Heckenhauer, Jacqueline; Powell, Ashlyn; Marshall, Dez; Larracuente, Amanda M.; Kelley, Joanna L.; Pauls, Steffen U.; Frandsen, Paul B. (October 2023, Genome Research)

Repetitive elements (REs) are integral to the composition, structure, and function of eukaryotic genomes, yet remain understudied in most taxonomic groups. We investigated REs across 601 insect species and report wide variation in RE dynamics across groups. Analysis of associations between REs and protein-coding genes revealed dynamic evolution at the interface between REs and coding regions across insects, including notably elevated RE–gene associations in lineages with abundant long interspersed nuclear elements (LINEs). We leveraged this large, empirical data set to quantify impacts of long-read technology on RE detection and investigate fundamental challenges to RE annotation in diverse groups. In long-read assemblies, we detected ∼36% more REs than short-read assemblies, with long terminal repeats (LTRs) showing 162% increased detection, whereas DNA transposons and LINEs showed less respective technology-related bias. In most insect lineages, 25%–85% of repetitive sequences were “unclassified” following automated annotation, compared with only ∼13% in Drosophila species. Although the diversity of available insect genomes has rapidly expanded, we show the rate of community contributions to RE databases has not kept pace, preventing efficient annotation and high-resolution study of REs in most groups. We highlight the tremendous opportunity and need for the biodiversity genomics field to embrace REs and suggest collective steps for making progress toward this goal.
Full Text Available
Long Reads Are Revolutionizing 20 Years of Insect Genome Sequencing

https://doi.org/10.1093/gbe/evab138

Hotaling, Scott; Sproul, John S; Heckenhauer, Jacqueline; Powell, Ashlyn; Larracuente, Amanda M; Pauls, Steffen U; Kelley, Joanna L; Frandsen, Paul B (August 2021, Genome Biology and Evolution)
Hoffmann, Federico (Ed.)
The first insect genome assembly (Drosophila melanogaster) was published two decades ago. Today, nuclear genome assemblies are available for a staggering 601 insect species representing 20 orders. In this study, we analyzed the most-contiguous assembly for each species and provide a “state-of-the-field” perspective, emphasizing taxonomic representation, assembly quality, gene completeness, and sequencing technologies. Relative to species richness, genomic efforts have been biased toward four orders (Diptera, Hymenoptera, Collembola, and Phasmatodea), Coleoptera are underrepresented, and 11 orders still lack a publicly available genome assembly. The average insect genome assembly is 439.2 Mb in length with 87.5% of single-copy benchmarking genes intact. Most notable has been the impact of long-read sequencing; assemblies that incorporate long reads are ∼48× more contiguous than those that do not. We offer four recommendations as we collectively continue building insect genome resources: 1) seek better integration between independent research groups and consortia, 2) balance future sampling between filling taxonomic gaps and generating data for targeted questions, 3) take advantage of long-read sequencing technologies, and 4) expand and improve gene annotations.
Full Text Available
Pathways to polar adaptation in fishes revealed by long‐read sequencing

https://doi.org/10.1111/mec.16501

Hotaling, Scott; Desvignes, Thomas; Sproul, John S.; Lins, Luana S. F.; Kelley, Joanna L. (June 2022, Molecular Ecology)

Long‐read sequencing is driving a new reality for genome science in which highly contiguous assemblies can be produced efficiently with modest resources. Genome assemblies from long‐read sequences are particularly exciting for understanding the evolution of complex genomic regions that are often difficult to assemble. In this study, we utilized long‐read sequencing data to generate a high‐quality genome assembly for an Antarctic eelpout, Ophthalmolycus amberensis, the first for the globally distributed family Zoarcidae. We used this assembly to understand how O. amberensis has adapted to the harsh Southern Ocean and compared it to another group of Antarctic fishes: the notothenioids. We showed that selection has largely acted on different targets in eelpouts relative to notothenioids. However, we did find some overlap; in both groups, genes involved in membrane structure, thermal tolerance and vision have evidence of positive selection. We found evidence for historical shifts of transposable element activity in O. amberensis and other polar fishes, perhaps reflecting a response to environmental change. We were specifically interested in the evolution of two complex genomic loci known to underlie key adaptations to polar seas: haemoglobin and antifreeze proteins (AFPs). We observed unique evolution of the haemoglobin MN cluster in eelpouts and related fishes in the suborder Zoarcoidei relative to other Perciformes. For AFPs, we identified the first species in the suborder with no evidence of afpIII sequences (Cebidichthys violaceus) in the genomic region where they are found in all other Zoarcoidei, potentially reflecting a lineage‐specific loss of this cluster. Beyond polar fishes, our results highlight the power of long‐read sequencing to understand genome evolution.
De novo whole genome assemblies of Agrypnia vestita Walker, and Hesperophylax magnus Banks reveal substantial repetitive element expansion in tube case-making caddisflies (Insecta: Trichoptera)

https://doi.org/10.1093/gbe/evab013

Olsen, Lindsey K; Heckenhauer, Jacqueline; Sproul, John S; Dikow, Rebecca B; Gonzalez, Vanessa L; Kweskin, Matthew P; Taylor, Adam M; Wilson, Seth B; Stewart, Russell J; Zhou, Xin; et al (January 2021, Genome Biology and Evolution)
Full Text Available
RepeatProfiler: A pipeline for visualization and comparative analysis of repetitive DNA profiles

https://doi.org/10.1111/1755-0998.13305

Negm, Sherif; Greenberg, Anya; Larracuente, Amanda M.; Sproul, John S. (January 2021, Molecular Ecology Resources)

Study of repetitive DNA elements in model organisms highlights the role of repetitive elements (REs) in many processes that drive genome evolution and phenotypic change. Because REs are much more dynamic than single‐copy DNA, repetitive sequences can reveal signals of evolutionary history over short time scales that may not be evident in sequences from slower‐evolving genomic regions. Many tools for studying REs are directed toward organisms with existing genomic resources, including genome assemblies and repeat libraries. However, signals in repeat variation may prove especially valuable in disentangling evolutionary histories in diverse non‐model groups, for which genomic resources are limited. Here, we introduce RepeatProfiler, a tool for generating, visualizing, and comparing repetitive element DNA profiles from low‐coverage, short‐read sequence data. RepeatProfiler automates the generation and visualization of RE coverage depth profiles (RE profiles) and allows for statistical comparison of profile shape across samples. In addition, RepeatProfiler facilitates comparison of profiles by extracting signal from sequence variants across profiles which can then be analysed as molecular morphological characters using phylogenetic analysis. We validate RepeatProfiler with data sets from ground beetles (Bembidion), flies (Drosophila), and tomatoes (Solanum). We highlight the potential of RE profiles as a high‐resolution data source for studies in species delimitation, comparative genomics, and repeat biology.
